Alternating optimization of decision trees, with application to learning sparse oblique trees
Learning a decision tree from data is a difficult optimization problem. The most widespread algorithm in practice, dating to the 1980s, is based on a greedy growth of the tree structure by recursively splitting nodes, and possibly pruning back the final tree. The parameters (decision function) of an internal node are approximately estimated by minimizing an impurity measure. We give an algorithm that, given an input tree (its structure and the parameter values at its nodes), produces a new tree with the same or smaller structure but new parameter values that provably lower or leave unchanged the misclassification error. This can be applied to both axis-aligned and oblique trees and our experiments show it consistently outperforms various other algorithms while being highly scalable to large datasets and trees. Further, the same algorithm can handle a sparsity penalty, so it can learn sparse oblique trees, having a structure that is a subset of the original tree and few nonzero parameters. This combines the best of axis-aligned and oblique trees: flexibility to model correlated data, low generalization error, fast inference and interpretable nodes that involve only a few features in their decision.
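The alternating scheme described above can be illustrated on the smallest case: a depth-1 oblique tree with one split node and two leaves. The following is a minimal sketch, not the paper's algorithm: the function names are illustrative, numpy is assumed, and the node step uses an L1-regularized logistic surrogate where the paper solves a weighted 0/1-loss problem over the points whose prediction the node can actually change.

```python
import numpy as np

def fit_oblique_stump(X, y, n_iter=10, lam=0.01, lr=0.1):
    """Sketch of alternating optimization on a depth-1 oblique tree:
    one split node with parameters (w, b) and two leaves holding class labels."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    w, b = rng.normal(size=d), 0.0
    left_label, right_label = 0, 1
    for _ in range(n_iter):
        # Node step: with the leaves fixed, each point "prefers" the child
        # whose leaf label classifies it correctly; points that both (or
        # neither) child would classify correctly don't care about the split.
        prefer_right = (y == right_label).astype(float)
        care = (y == left_label) != (y == right_label)
        m = max(care.sum(), 1)
        # A few (sub)gradient steps on L1-regularized logistic loss over the
        # "care" points stand in for the exact weighted 0/1-loss solve used
        # in the paper; the L1 term is what induces sparse oblique splits.
        for _ in range(50):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
            g = (p - prefer_right) * care
            w -= lr * (X.T @ g / m + lam * np.sign(w))
            b -= lr * g.sum() / m
        # Leaf step: with the split fixed, each leaf takes the majority class
        # of the training points that reach it (this step provably never
        # increases the training misclassification error).
        go_right = X @ w + b > 0
        for mask, side in ((~go_right, "left"), (go_right, "right")):
            if mask.any():
                vals, counts = np.unique(y[mask], return_counts=True)
                if side == "left":
                    left_label = vals[np.argmax(counts)]
                else:
                    right_label = vals[np.argmax(counts)]
    return w, b, left_label, right_label

def predict(X, w, b, left_label, right_label):
    return np.where(X @ w + b > 0, right_label, left_label)
```

For a deeper tree the same two steps are applied node by node over the fixed structure, which is what lets the procedure shrink the tree (a leaf pair with equal labels can be merged) while never increasing the training error at the leaf step.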
Polyhedron Attention Module: Learning Adaptive-order Interactions (appendixes)
Given the definition of our attention in Eq. 9 in the main text, the highest polynomial order is … Before providing the proof of Theorem 4, we establish Lemma 1 as its foundation. We follow the principle of Yan et al.'s work. In Figure 1, we consider two kinds of value functions, i.e., … In PAM-Net, we set the number of levels to 2. A grid search is performed over different configurations. We conduct grid searches on the dropout rate over {0, 0.1, 0.2} and the initial …